Genomics of the “tumorigenes” clade of the family Rhizobiaceae and description of Rhizobium rhododendri sp. nov

Abstract Tumorigenic members of the family Rhizobiaceae, known as agrobacteria, are responsible for crown and cane gall diseases of various crops worldwide. Tumorigenic agrobacteria are commonly found in the genera Agrobacterium, Allorhizobium, and Rhizobium. In this study, we analyzed a distinct “tumorigenes” clade of the genus Rhizobium, which includes the tumorigenic species Rhizobium tumorigenes, as well as strains causing crown gall disease on rhododendron. Here, high‐quality, closed genomes of representatives of the “tumorigenes” clade were generated, followed by comparative genomic and phylogenomic analyses. Additionally, the phenotypic characteristics of representatives of the “tumorigenes” clade were analyzed. Our results showed that the tumorigenic strains isolated from rhododendron represent a novel species of the genus Rhizobium for which the name Rhizobium rhododendri sp. nov. is proposed. This species also includes additional strains originating from blueberry and Himalayan blackberry in the United States, whose genome sequences were retrieved from GenBank. Both R. tumorigenes and R. rhododendri contain multipartite genomes, including a chromosome, putative chromids, and megaplasmids. Synteny and phylogenetic analyses indicated that a large putative chromid of R. rhododendri resulted from the cointegration of an ancestral megaplasmid and two putative chromids, following its divergence from R. tumorigenes. Moreover, gene clusters specific for both species of the “tumorigenes” clade were identified, and their biological functions and roles in the ecological diversification of R. rhododendri and R. tumorigenes were predicted and discussed.


| INTRODUCTION
The family Rhizobiaceae contains genetically and phenotypically diverse bacteria isolated from various environments. Accordingly, Rhizobiaceae members exhibit remarkably diverse lifestyles, ranging from plant symbionts (rhizobia) and pathogens (agrobacteria) to opportunistic human pathogens, to free-living species in soils, sediments, and water (Carareto Alves et al., 2014). In this respect, the general term "agrobacteria" refers to a polyphyletic group of Rhizobiaceae lineages that can cause neoplastic diseases in plants (de Lajudie et al., 2019).
Agrobacteria are remarkable plant pathogens, as the infection process represents an interkingdom genetic exchange involving the integration of a fragment of bacterial plasmid DNA (transferred DNA or T-DNA) into plant host cells (Gelvin, 2017).
Consequently, agrobacteria cause crown and cane gall (Escobar & Dandekar, 2003;Puławska, 2010), and hairy root (Bosmans et al., 2017) diseases, depending on whether they carry a tumor-inducing (Ti) or root-inducing (Ri) plasmid. Hence, Ti and Ri plasmids code for functions essential for pathogenicity. Ti and Ri plasmids are related, although the former plasmid group has been studied more extensively. Ti plasmids are transmissible and self-conjugal (reviewed in Farrand, 1998). However, the natural host range of Ti plasmids is relatively narrow and restricted to members of the family Rhizobiaceae. To date, strains carrying Ti plasmids and able to cause crown gall and hairy root diseases (agrobacteria) have been primarily identified within the genera Agrobacterium, Allorhizobium, and Rhizobium. Additionally, a Neorhizobium strain carrying a Ti plasmid and able to cause tumors on multiple host plants was identified recently (Haryono et al., 2018). Historically, Rhizobium rhizogenes (i.e., Agrobacterium biovar 2/Agrobacterium rhizogenes) was the only tumorigenic Rhizobium species. However, another member of this genus, Rhizobium tumorigenes, was recently isolated from cane gall tumors on thornless blackberries (Kuzmanović et al., 2018). In addition, genomic analyses now suggest that the tumorigenic strain AB2/73 (Anderson & Moore, 1979), initially identified as a biovar 2 strain (R. rhizogenes) (Unger et al., 1985), actually belongs to a novel, so far undescribed Rhizobium species (Hooykaas & Hooykaas, 2021).
In our previous work, we identified a novel group of tumorigenic agrobacteria associated with crown gall disease of rhododendron . Phylogenetic and genomic analyses suggested that these strains are most closely related to R. tumorigenes, but represent a separate species. Collectively, we named this distinct Rhizobium clade comprising R. tumorigenes and novel rhododendron strains as "tumorigenes." In this study, we generated high-quality, closed genomes of representatives of the "tumorigenes" clade, and performed thorough comparative genomic and phylogenomic analyses. Moreover, we phenotypically characterized rhododendron strains and described them as a novel species, Rhizobium rhododendri.

| Bacterial strains
Rhizobium strain rho-6.2 T ( = DSM 110655 T = CFBP 9067 T ) used in this study was isolated in 2017 from crown gall tumors on rhododendron originating from a nursery in Lower Saxony, Germany . In addition, we used R.
Cultures were stored in a −80°C freezer in nutrient broth with 20% glycerol for long-term preservation.
Oxidase activity was tested by the method of Kovacs (1956).
Catalase tests were performed by mixing freshly grown bacterial cells with 10% H 2 O 2 , followed by an examination of gas bubble formation. Growth at 5,10,15,20,25,30,35,and 40°C was determined in R2A broth for up to 9 days. Tests for 3-ketolactose production and acid clearing on PDA-CaCO 3 were performed as described before (Moore et al., 2001). Additionally, the strains rho-6.2 T and 1078 T were phenotypically characterized using the API 20NE system (bioMérieux) following the instructions provided by the manufacturer.
For the FAME analysis, strains were cultured on R2A medium at 25°C for 3 days. The cellular fatty acids were analyzed using the Microbial Identification System (MIDI; Sherlock version 6.1, TSBA40 method), according to instructions provided by the manufacturer (Sasser, 1990). Combined analysis by gas chromatography coupled to a mass spectrometer was used to confirm the identity of the fatty acids based on retention time and mass spectral data (Vieira et al., 2021).

| DNA extraction
Genomic DNA was extracted from bacterial strains using a Qiagen Genomic DNA Buffer Set (Qiagen;Cat. No. 19060) and Qiagen genomic tip 100/G gravity-flow, anion exchange columns (Cat. No. 10243). The purity and approximate concentration of DNA were determined by spectrophotometry using the Nano-Drop instrument. Genomic DNA integrity was assessed by agarose gel electrophoresis.

| Eckhardt-type gel electrophoresis
The plasmid content of Rhizobium strains rho-6.2 T , 1078 T , and 932 was analyzed by the modified method of Eckhardt (1978).
This method can also allow visualization of other extrachromosomal replicons, such as smaller chromids. Separation and visualization of replicons were performed in a 0.7% (w/v) agarose gel (5 mm thick) prepared in 1× Tris-borate-EDTA (TBE) buffer using the following procedure. Bacteria were grown in TY medium for 24 h at 28°C. Approximately 0.5-1 mL of bacterial culture was centrifuged at 8000 rcf (g) for 10 min, and the pellet was resuspended in 0.5 mL sterile distilled water. One milliliter of 0.3% (m/v) sodium lauroyl sarcosinate was added, after which the cell suspension was gently mixed and centrifuged at 8000 rcf for 5 min. The pellet was resuspended in 40 µL 20% Ficoll 400 (w/v) in TE buffer (10 mmol L -1 Tris-HCl, 1 mmol L -1 EDTA, pH 8.0) and samples were incubated for 15 min on ice. An agarose gel was prepared during the previous incubation steps by loading 25 μL of 10% sodium dodecyl sulfate (SDS; w/v) into empty wells, followed by gently flooding the gel with 1× TBE buffer and running electrophoresis at 4 V cm −1 for 15 min from positive to negative polarity (opposite direction to standard DNA gel electrophoresis). Next, 10 µL lysing solution in TE buffer containing 0.4 mg mL -1 RNase A, 1 mg mL -1 bromophenol blue, and 1.5 mg mL -1 lysozyme (freshly prepared aqueous solution) was added to each cell sample after incubation on ice. A 30 µL aliquot of the mixture was loaded immediately into wells in the gel. Electrophoresis was run first at 1.5 V cm -1 for 1 h and then at 4 V cm -1 for 20 h (standard DNA gel electrophoresis from negative to positive polarity). The gel was stained in ethidium bromide solution (1 μg/mL) and the plasmids were visualized under UV light. As markers, "Agrobacterium fabrum" C58 T , Allorhizobium ampelinum S4 T , and R. rhizogenes K84 carrying replicons of known size were used.

| Illumina library preparation and sequencing
Libraries for Illumina sequencing were prepared using a Nextera XT DNA Library Preparation Kit (Illumina) with modifications according to Baym et al. (2015). Genome sequencing of strains 1078 T and 932 was performed using an Illumina NextSeq 500 platform in PE75 mode. For strain rho-6.2 T , paired-end 151 bp reads previously generated on an Illumina NextSeq 500 platform  were used for error correction of the PacBio assembly (see below). 2.7 | Genome assembly, error-correction, and annotation SMRT Cell data were assembled using the "RS_HGAP_Assembly.3" protocol included in SMRT Portal version 2.3.0 using default parameters. Assembled replicons were circularized and adjusted to dnaA (chromosomes) or repA (chromids and megaplasmids) as the first gene.

| Classification, synteny, and phylogeny of DNA replicons
Bacterial replicons were classified using an approach similar to that described previously (diCenzo & Finan, 2017;diCenzo et al., 2019), except that a size threshold was not used in defining megaplasmids or chromids (Hall et al., 2022). The largest replicon in a genome was classified as the chromosome. The remaining replicons were considered putative chromids if both their %GC content and dinucleotide relative abundance (DRA) distance differed by not more than approximately 1% and 0.4%, respectively, compared to the chromosome. DRA distances were computed as described by diCenzo and Finan (2017). The replicons that failed to meet one of the two criteria (%GC-and DRA distance-based) for chromid classification were further analyzed through comparative genomic and phylogenetic analysis to reconstruct their evolutionary history, as described in the following paragraphs. The remaining replicons that did not meet any of the above-mentioned criteria were classified as megaplasmids.
Synteny between genomes of the "tumorigenes" clade was explored using circos. First, blast bidirectional best hits (Blast-BBHs) were identified using BLASTn version 2.10.1+ (Camacho et al., 2009) and a custom Matlab script, limiting Blast-BBHs to those with pairs where at least 50% of each protein was aligned with e-values ≤1e −100. The parsed output was then used to prepare a "links" file, which was provided to circos 0.69-8 (Krzywinski et al., 2009). Furthermore, BRIG (BLAST Ring Image Generator) program version 0.95 (Alikhan et al., 2011) was used for a visual representation of replicons of strain rho-6.2 T (reference sequences) with the orthologous replicons of related strains (query sequences). The BRIG analysis was done by using the BLASTn option.
To assess the evolutionary relationships among the extrachromosomal replicons of the "tumorigenes" clade, phylogenetic analysis based on the RepA and RepB protein sequences was conducted.
Model selection was conducted using IQ-TREE ModelFinder (Kalyaanamoorthy et al., 2017) based on Bayesian Information Criterion (BIC) (Schwarz, 1978). Branch supports were assessed by ultrafast bootstrap analysis (UFBoot) (Hoang et al., 2017) and the SH-aLRT test (Guindon et al., 2010)  To examine potential relationships between the extrachromosomal replicons of the "tumorigenes" clade and other members of the family Rhizobiaceae, a previously described pipeline was adapted (diCenzo et al., 2019). Shortly, putative RepA proteins were identified in each of the Rhizobiaceae proteomes using the hmmsearch function of HMMER version 3.3 (Eddy, 2009) and the Pfam ParA hidden Markov model (HMM). All hits were then searched against the complete Pfam version 34.0 and TIGERFAM version 15.0 databases (Finn et al., 2016;Haft et al., 2013), and proteins were classified as RepA if the top was either the ParA (Pfam) or TIGR03453 (TIGRFAM) HMM. All RepA proteins were aligned with MAFFT version 7.471 (Katoh & Standley, 2013) with the "localpair" option and then trimmed with trimAl version 1.4.rev22 (Capella-Gutiérrez et al., 2009) with the "automated1" option. A maximum likelihood phylogeny was constructed using RAxML version 8.2.12 (Stamatakis, 2014) with the LG amino acid substitution model with empirical base frequencies and the final tree represents the bootstrap best tree following 500 bootstrap replicates. In addition, RepA proteins were clustered using CD-HIT version 4.8.1 (Li & Godzik, 2006) with a 90% identity threshold.

| Genome-based phylogenetic analyses
The data set comprised 119 genomes, including 116 Rhizobiaceae strains and three Mesorhizobium spp. that were used as an outgroup (  (Vinuesa et al., 2018) as described before (Kuzmanović, Biondi, et al., 2022). For core-genome-based phylogenetic analyses, the latter pipeline was run using both DNA and protein sequences, thus generating core-genome and coreproteome phylogenies, respectively. The protein alignment generated by the cpAAI pipeline (see below) was also used as input for phylogenetic analysis. An ML phylogeny was inferred under the best-fitting substitution model by employing IQ-TREE Version 2.1.3 (Nguyen et al., 2015) and ModelFinder (integrated into IQ-TREE) (Kalyaanamoorthy et al., 2017), following the same approach as implemented in the GET_PHYLOMARKERS package.

| Genome and proteome-relatedness indices
For the calculation of genome and proteome relatedness indices, we used the same data set as for phylogenetic analysis (see above; reference strains, using the prealigned reference protein files (option -A) as described in Kuzmanović, Fagorzi, et al. (2022).
Nucleotide FASTA files of all CDSs predicted by Prokka were used as input files. The cpAAI values were computed from the resulting alignment using a custom R script (see https://github.com/flass/ cpAAI_Rhizobiaceae) that relied on the "dist.aa" function from the "ape" package (Paradis & Schliep, 2018). Additionally, we calculated cpAAI values using the core protein markers inferred from the 119 strains included in the present study, which were identified and selected using the GET_HOMOLOGUES and GET_PHYLOMARKERS tools, respectively (see above).
Heatmaps representing genome (OGRIs) and proteome (cpAAI and wpAAI) relatedness values were generated and plotted onto the reference core-proteome phylogenetic tree by the "phylo.heatmap" function in the R package phytools (Revell, 2012).

| Identification of species-specific genes
To identify genes specific for each of the two species comprising the "tumorigenes" clade, the pan-genome of the "tumorigenes" clade was explored. The data set included two R. tumorigenes strains (1078 T and 932) and 12 strains comprising the new species R. rhododendri (see below; Table A1, https://doi.org/10.6084/m9. figshare.22193809). The analysis was performed using the GET_HOMOLOGUES software and its auxiliary scripts as described before (Kuzmanović, Biondi et al., 2022).
To determine if the function of the species-specific gene or gene cluster of interest is compensated by isoenzymes or by a divergent homologous gene(s) in the other species, we performed BLASTp (Johnson et al., 2008) comparisons and examined annotations of pangenome genes of species comprising the "tumorigenes" clade that were generated by GhostKOALA (Kanehisa et al., 2016) and eggNOG-mapper (Cantalapiedra et al., 2021).

| Genome sequences and rRNA operon diversity
The finished genome sequences of three representative members of the Rhizobium clade "tumorigenes" (rho-6.2 T , 1078 T , and 932) were generated using a combination of long-(PacBio) and short-read (Illumina) sequencing technologies (see Table A2 Figure A1). The total genome size of the three strains was similar, ranging from 5.96 to 5.98 Mb (Table 1). The GC content was approximately 60% for all strains (Table 1).
For all three strains, four rRNA operons (5S, 16S, and 23S rRNA) were identified on the largest replicon. Unlike strains rho-6.2 T and 1078 T , we did not observe intragenomic heterogeneity between multiple rRNA operons in strain 932. In strain rho-6. (1078 T and 932) and differed by 10 SNPs from those of strain rho-6.2 T .

| Genome organization
Whole-genome sequencing revealed that all strains in the "tumorigenes" clade contain multipartite genomes ( Figure 1 and Table 2). The largest replicon in all sequenced genomes, which also carried all four copies of the rRNA operon, was classified as the chromosome. Chromosomes were highly conserved across all strains of the "tumorigenes" clade ( Figure 2 and Appendix: Figure A2a).
In accordance with our previous work demonstrating the pathogenicity of the "tumorigenes" clade ( (198,177) that comprises genes coding for an unknown opine. The pTi1078 plasmid also carried multiple copies of the vir genes (virDE,coordinates 123,560;virBGCD,190,079;virDCGB,348,220;and virA,373,237), and two 800;and 257,612). The first T-DNA includes genes encoding synthesis of the opines agrocinopine and nopaline, and the second contains genes associated with the production of an unknown opine. The pTi932 plasmid was closely related to pTi1078, and their vir and T-DNA regions were highly similar or identical.
Likewise, a 25-gene cluster of putative chromid 3 of strain 1078 was found on putative chromid 1 of strain 932 (Figure 2a). Based on the location of these gene clusters in the more distantly related strain rho-6.2 T (Figure 2b), the observed translocations likely occurred in the lineage leading to strain 1078 T following divergence from strain 932.
Unlike R. tumorigenes strains 1078 T and 932 which carried five extrachromosomal replicons, strain rho-6.2 T carried only three: a Ti plasmid and two putative chromids. Surprisingly, synteny analysis suggested that putative chromid 1 of rho-6.2 T resulted from the cointegration of an ancestral megaplasmid (orthologous to pRt1078 of strain 1078 T ) and two putative chromids (orthologous to putative chromids 1 and 3 of 1078 T ) ( Figure 2b). Consistent with the proposed cointegration scenario, the cointegrant of strain rho-6.2 T (putative chromid 1) contains three repABC cassettes, which are orthologous to the repABC cassettes of pRt1078 and putative chromids 1 and 3 of strain 1078 T (Appendix: Figure A3). However, only one repC copy on the cointegrant is complete, with the other two copies appearing to be truncated and thus nonfunctional. The orthologous cointegrant replicon was also carried by other R.
Putative chromid 2 of rho-6.2 T displayed high conservation with putative chromid 2 of strain 1078 T (Appendix: Figure A2c), whereas poor conservation was observed when comparing the Ti plasmids of these strains (Appendix: Figure A2b). Orthologous replicons of putative chromid 2 are also present in other R.
rhododendri strains, with all exhibiting a high degree of synteny (Appendix: Figure A2c). On the other hand, the RepA proteins of each of the three R. tumorigenes chromids formed their own cluster in the RepA phylogeny (Appendix: Figure A4) and they shared less than 92% identity with all other RepA proteins from the family Rhizobiaceae. A comparison of the R. tumorigenes chromids with the most closely related replicons from other Rhizobiaceae species using D-Geneies (Cabanettes & Klopp, 2018) identified no obvious stretches of synteny. Overall, these results suggest that the three chromids of R. tumorigenes are specific to the "tumorigenes" clade.
We could not identify genes associated with mobilization or conjugation on the chromids of strains 1078 T and 932. On the other hand, megaplasmids pRt932 and pRt1078 carried gene clusters involved in conjugative transfer, including genes coding for conjugative relaxase (traA), coupling protein (traG), and T4SS proteins (trb). Likewise, the large cointegrant of rho-6.2 T carries genes for conjugation. Interestingly, however, these genes were divergent from those carried on pRt932 and pRt1078. For instance, the VirB4 protein sequences of pRt1078 and the cointegrant of rho-6.2 shared only 40.8% identity.
T A B L E 1 General features of the genome sequences obtained in this study.

| Phylogeny of the clade "tumorigenes"
The core genome of the 119 strains included in the analysis was identified using GET_HOMOLOGUES and comprised 364 homologous gene clusters. Phylogeny was inferred from 253 DNA and 191 protein markers that were selected using the GET_PHYLOMARKERS software. Moreover, we also inferred phylogeny from the protein alignment outputted by the cpAAI pipeline, which represented the concatenated sequence of a reference set of 169 protein markers.
Although the original data set included 170 protein markers, one marker gene was missing in Onobrychidicola muellerharveyae TH2 T , and we, therefore, excluded this marker from the analysis.
All the resulting phylogenies were highly congruent, showing almost identical phylogenetic relationships between Rhizobiaceae genera and major Rhizobiaceae clades (data not shown). The only difference was the position of the genus Xaviernesmea, which was an (a) F I G U R E 1 Circular maps of the complete genomes of strain Rhizobium rhododendri rho-6.2 T (a), and strains of Rhizobium tumorigenes 1078 T (b) and 932 (c). Each replicon is presented by a circular plot containing five rings. Genetic coordinates of the reference sequences are shown within the thin inner ring. The next two rings portray GC content (black ring) and GC skew (purple/green). The next ring shows core (red) and species-specific (blue) genes. Core genes (364) were identified from a data set of 119 strains using GET_HOMOLOGUES software. The accessory genes (R. rhododendri vs. R. tumorigenes) were identified with the same software. The outermost ring highlights prophage regions identified with PHASTER (intact prophages are shown in green and incomplete in orange) and insertion sequence (IS) elements identified using ISEscan (shown in gray). As in some cases IS elements were identified within the prophage regions, borders of the latter regions are highlighted with the corresponding color. The Figure was generated using BRIG software and edited with Inkscape (see M&M for details). and Appendix: Figure A5). R. tumorigenes (1078 T and 932) and strains isolated from rhododendron in Germany (rho-6.2 T , rho-1.1, and rho-13.1) clustered within two sister subclades in the clade we previously defined as "tumorigenes" (Kuzmanović et al., 2019) ( Figure 3 and Appendix: Figure A5). The subclade containing the three rhododendron strains also included nine other Rhizobium strains whose genomes were available in GenBank. The rhododendron clade could be further divided into two clusters. The first cluster comprised our three rhododendron strains and strain L51/94 isolated from blueberry in Oregon (USA), while the second clade consisted of eight Rhizobium strains isolated from Himalayan blackberry in Oregon (Weisberg et al., 2022). The "tumorigenes" clade falls within the socalled core Rhizobium species complex, although it was distantly related to other Rhizobium species. R. tubonense was the closest relative of the "tumorigenes" representatives, while other Rhizobium species grouped within the "tropici-rhizogenes" clade and the more (b) F I G U R E 1 Continued distantly related "leguminosarum-etli" clade ( Figure 3 and Appendix: Figure A5).
The ML pan-genome phylogeny was estimated from a presence/ absence matrix of 71,538 orthologous gene clusters. All Rhizobiaceae genera and major clades were resolved on the resulting tree ( Figure 4 and Appendix: Figure A6), although their phylogenetic relationships differed from that determined by the core-proteome phylogeny ( Figure 3 and Appendix: Figure A5). Nevertheless, the pan-genome phylogeny also contained the same two subclades within the "tumorigenes" clade: one comprising R. tumorigenes, and another with the rhododendron strains and those whose genomes were retrieved from the GenBank.

| Species delineation
For species delineation, we relied on ANI and dDDH computations.
The threshold for species delineation was set at~95%-96% for ANI (Richter & Rosselló-Móra, 2009), consistent with previous recommendations (Goris et al., 2007). As for the conventional version of DDH, the generally accepted species boundary for dDDH values is 70% (Meier-Kolthoff et al., 2013;Stackebrandt & Goebel, 1994). In this study, delineations of strains achieved by ANIb, OrthoANIu, FastANI, and dDDH were highly congruent. Several differences observed are discussed below. In any case, the OGRIs were consistent for the strains of the "tumorigenes" clade, which are the (c) | 9 of 29 primary subject of this study ( No. PRJNA762915) that also belong to the species R. rhododendri, based on ANIb comparisons (>99% ANI with rho-6.2 T ). These strains were isolated from cane galls of blueberry in Oregon (USA) in 2019.
However, as these genomes were unpublished, they were not included in further analyses.
Phenotypic characteristics of strains R. rhododendri rho-6.2 T and R. tumorigenes 1078 T are listed in Table A5 at https://doi.org/10. 6084/m9.figshare.22193809. As expected, these two strains showed almost identical phenotypic characteristics, and we were unable to identify clear differential characteristics. For the strain rho-6.2 T , phenotypic characteristics are summarized in the protologue for the new species R. rhododendri (see below).
The results of the fatty acid analysis are summarized in Table A6 at https://doi.org/10.6084/m9.figshare.22193809. Similar to the other phenotypic characteristics that were measured, strains R.

Interestingly, our results also suggested the existence of a new
Rhizobium species within the clade "tropici-rhizogenes" that is associated with crown gall disease. This potential new species included strains 17-2069-2b and 17-2069-2c isolated from blackberry, which were reported to carry Ti plasmids (Weisberg et al., 2022).

| Genus demarcation
Differentiation of Rhizobiaceae strains at the genus level was conducted using cpAAI and wpAAI indices (  (Kuzmanović, Fagorzi, et al., 2022). As noted above, one marker was missing in O. muellerharveyae TH2 T , and thus the comparison was based on 169 marker proteins. We used a cpAAI threshold of~86%, combined with the core-proteome phylogeny shown in Figure 3 and Appendix: Figure A5, in considering genus delineation. As expected, genus demarcations ( Figure 5) were generally consistent with our previous study (Kuzmanović, Fagorzi, et al., 2022). However, the present data set included a larger number of strains belonging to the core Rhizobium superclade compared to our previous analysis.

| R. rhododendri
Based on the pan-genome analysis, 272 genes specific to R. rhododendri (Rr-specific) were identified. These genes were present in all 12 strains of R. rhododendri and absent in both R. tumorigenes strains. More than half (138) of these genes were located on putative chromid 1. Of the remaining genes, 108 were located on the chromosome, 27 on putative chromid 2, and 1 on pTi6.2 (Figure 1a). Most of the Rr-specific genes were annotated as hypothetical proteins or their function could not be clearly determined (  homologous, but divergent gene clusters with the same predicted function were also present in both R. tumorigenes strains (At1078_ 03513-At1078_ 03517 in strain 1078 T ), and in another copy in strain rho-6.2 T (Rr62_03520-Rr62_03524). All putative gene clusters putatively associated with cellulose synthesis were located on chromosomes.

| R. tumorigenes
Exploration of the pan-genome of the "tumorigenes" clade resulted in 322 genes that are specific to R. tumorigenes (Rt-specific), meaning they are present in both R. tumorigenes strains and absent from all R.
rhododendri strains. In strain 1078 T , one of these 322 genes was present in three copies, while there were two copies of five other genes. In strain 932, nine genes were present in two copies. F I G U R E 2 Synteny analysis of the genomes of the "tumorigenes" clade representatives. The Rhizobium tumorigenes 1078 T genome was compared with the genome of R. tumorigenes 932 (a) and Rhizobium rhododendri rho-6.2 T (b). Putative orthologous genes between strains were identified by performing BLAST bidirectional best-hit analyses using the proteomes. BLAST bidirectional best hits with an E value of ≤1 × 10 − 100 and ≥50% identity were linked to the corresponding gene, and their position was mapped on the genome. Each putative ortholog between genomes is connected by a line and color-coded based on the location of the gene in the R. tumorigenes 1078 T genome. The Figure was generated using circos software and edited with Inkscape (see M&M for details).
| 11 of 29 (see subsection "Genome organization"). As for the Rr-specific genes, the function of the majority of the Rt-specific genes could not be precisely determined ( Avi_1489), which was previously described (Herlache et al., 1997). On the other hand, orthologous protein sequences showing relatively high amino acid identity (>74%) with the polygalacturonase protein sequence F I G U R E 3 Maximum-likelihood core-proteome phylogeny showing the evolutionary relationships between and within the clade "tumorigenes" and other Rhizobiaceae clades (partly collapsed). Three Mesorhizobium spp. strains were included as the outgroup to root the tree. The phylogeny was estimated from the concatenated alignments of 191 protein sequences selected as top-scoring markers using the GET_ PHYLOMARKERS software. The numbers on the nodes indicate the approximate Bayesian posterior probabilities support values (first value) and ultra-fast bootstrap values (second value), as implemented in IQ-TREE. The scale bar represents the number of expected substitutions per site under the best-fitting LG + F + R6 model. The same tree, but without collapsing clades, is presented in Appendix: Figure A5.
of strain 1078 T were identified in various members of Agrobacterium clade "rubi," for example, in Agrobacterium vaccinii (84.12% amino acid identity) .

| Novel insights into the taxonomic diversity of agrobacteria
In this study, we conducted polyphasic characterization of "tumorigenes" clade representatives and described a novel species R. rhododendri (see below the protologue). The species R. rhododendri comprised tumorigenic strains isolated from aerial tumors on rhododendron , but also additional strains originating from blueberry and Himalayan blackberry in Oregon (USA) (Weisberg et al., 2022) whose genome sequences were retrieved from GenBank.
R. rhododendri represents an additional Rhizobium species associated with crown/cane gall disease, which further expands our understanding of the taxonomic diversity of agrobacteria. By searching the NCBI GenBank (nr/nt and wgs databases), we could not identify additional strains belonging to the "tumorigenes" clade. Nevertheless, we assume that members of this clade are distributed more widely and their genetic diversity remains to be explored.

F I G U R E 4
Maximum-likelihood pan-genome phylogeny showing the relationships between and within the clade "tumorigenes" and other Rhizobiaceae clades (partly collapsed). Three Mesorhizobium spp. strains were included as the outgroup to root the tree. The tree was estimated with IQ-TREE from the consensus (COGtriangles and OMCL clusters) gene presence/absence matrix containing 71,538 clusters obtained using GET_HOMOLOGUES software. The numbers on the nodes indicate the approximate Bayesian posterior probabilities support values (first value) and ultra-fast bootstrap values (second value), as implemented in IQ-TREE. The scale bar represents the number of expected substitutions per site under the best-fitting GTR2 + FO + R8 model. The same tree, but without collapsing clades, is presented in Appendix: Figure A6.
| 13 of 29 Apart from R. rhizogenes and the "tumorigenes" clade, agrobacteria also occur within other Rhizobium clades. In particular, the "tropicirhizogenes" clade comprises at least two additional species that include tumorigenic strains. The first one corresponds to Rhizobium sp.
AB2/73 which was isolated from Lippia canescens in the USA (Anderson & Moore, 1979). Recently, Hooykaas and Hooykaas (2021) suggested that this strain belongs to a novel Rhizobium species, which is closely related to Rhizobium dioscoreae according to our results. The second putative species includes Ti plasmid-carrying strains 17-2069-2b and 17-2069-2c isolated from blackberry (Weisberg et al., 2022). The closest relative of this potentially novel species is Rhizobium hainanense (Appendix: Figure A5,

| Genome architecture of the "tumorigenes" clade
The large number of extrachromosomal elements in members of the "tumorigenes" clade is not surprising as the genomes of nearly all members of the family Rhizobiaceae consist of a multipartite architecture (Geddes et al., 2020), with R. leguminosarum Rlv3841 containing six extrachromosomal replicons between 151 and 870 kb (Young et al., 2006). Nonchromosomal replicons vary in size and essentiality. While classification systems exist to classify replicons into distinct classes (i.e., plasmid, megaplasmid, chromid), it has been argued that these groups of replicons belong to a spectrum with blurred boundaries (diCenzo & Finan, 2017;Hall et al., 2022).
We agree with this perspective; yet, we also consider that the classification of replicons into distinct groups can nevertheless be useful, in some circumstances, to quickly convey the general properties of a replicon of interest.
The genomes of both R. tumorigenes strains are split across six replicons: one chromosome, three putative chromids, and two megaplasmids that includes a Ti plasmid. Phylogenetic analysis indicated that the five extrachromosomal replicons of R. tumorigenes 1078 T had a corresponding replicon in strain 932, as well as corresponding replicons or regions in R. rhododendri rho-6.2 T . In contrast, the related organism R. tubonense CCBAU 85046 T appears F I G U R E 5 Clustered heatmap of core-proteome average amino-acid identity (cpAAI) values between the members of the clade "tumorigenes" and other Rhizobiaceae clades. Three Mesorhizobium spp. strains were included as the outgroup. cpAAI values were computed from a reference set of 169 protein markers defined in our previous study (Kuzmanović, Fagorzi, et al., 2022). Although the original data set included 170 protein markers, one marker gene was missing in Onobrychidicola muellerharveyae TH2 T , and we, therefore, excluded this marker from the analysis. cpAAI values were clustered using the core-proteome phylogeny of Figure 3 and Appendix: Figure A5. The "tumorigenes" clade and other Rhizobiaceae clades are indicated with red boxes.
to have two extrachromosomal replicons based on our RepA analysis (Appendix: Figure A4); however, neither appeared to be orthologous to any of the extrachromosomal replicons of R. tumorigenes 1078 T .
We thus conclude that the five extrachromosomal replicons of R. tumorigenes were acquired by an ancestor after the split from R. tubonense but before the split from R. rhododendri.
Of the five extrachromosomal replicons of R. tumorigenes, three were classified as putative chromids according to the sequence- R. rhododendri rho-6.2 T contains two fewer extrachromosomal replicons than do R. tumorigenes strains 1078 T and 932. Synteny and phylogenetic analyses indicated that this is due to a co-integration of the megaplasmid and two putative chromids in the R. rhododendri

| Diversification of "tumorigenes" clade
In this study, we identified genes specific for each of the two species the R. rhododendri and R. tumorigenes by examining the pan-genome of the "tumorigenes" clade. Our objective was to identify potentially adaptive features among species-specific genes, to gain a better understanding of the ecological differentiation of these species. We recognize, however, that the availability of genomes for only two R.
tumorigenes strains is a limitation of this analysis, and that the number of species-specific genes will likely decrease as more genomes become available. Nevertheless, based on the available genomes, the majority of species-specific genes are encoded on putative chromids and megaplasmids, which is in line with previous studies analyzing the A. tumefaciens species complex (Lassalle et al., 2017) and All. vitis species complex (Kuzmanović, Biondi, et al., 2022) strains. Although most of the species-specific genes are annotated as encoding hypothetical or poorly described proteins, we could determine putative functions for several genes and gene clusters.
Both R. rhododendri and R. tumorigenes strains carried a putative gene cluster involved in the production of cellulose; however, the former species carried an additional cluster with the same putative function. Both clusters were homologous, but divergent in sequence. Agrobacterium and Rhizobium spp. were reported to synthesize cellulose (reviewed in Augimeri et al., 2015;Ross et al., 1991). In Agrobacterium spp., production of the exopolysaccharide cellulose is associated with the attachment of bacteria to plant surfaces (Matthysse et al., 1981). Although cellulose synthesis was not required for the virulence of Agrobacterium, cellulose mutants could not firmly attach to a host plant, which reduced tumor formation (Matthysse, 1983). Similarly, in R. leguminosarum, cellulose production is involved in rhizobial attachment to plant roots (Smit et al., 1987).
R. rhododendri carried genes putatively associated with the uptake of simple sugars and sugar alcohols, such as galactitol and sorbitol, that were absent in R. tumorigenes, suggesting that the catabolic capacity of R. rhododendri and R. tumorigenes differs.
These two sugar alcohols, in addition to mannitol, are widely distributed in angiosperms where they may be involved in response to abiotic and biotic stresses (Moing, 2000). The potential ability of R. rhododendri to process these compounds could contribute to its environmental adaptation and association with higher plants.
On putative chromid 1, R. tumorigenes carried a putative gene cluster associated with T6SS. Homologous genes were not identified in any of the R. rhododendri strains. The T6SS is commonly found in plant-associated bacteria and can have diverse genetic architecture (Bernal et al., 2018). A putative gene cluster encoding T6SS in R.
Accordingly, a T6SS in R. tumorigenes might contribute to its competitiveness in plant tissue or rhizosphere. with nodulation efficiency in some hosts (Le Quéré et al., 2006;Margaret et al., 2012). Therefore, it is tempting to speculate that the synthesis of pseudaminic acid might be involved in tumorigenesis of R. tumorigenesis and plant host invasion.
| 15 of 29 R. tumorigenes strains carried putative chromid-borne gene coding for polygalacturonase, one of the most important enzymes associated with cell wall degradation. It has been reported that All.
vitis species complex strains can produce this enzyme, which plays a role in grapevine root decay (McGuire et al., 1991;Rodriguez-Palenzuela et al., 1991). Different rhizobia such as R. leguminosarum and Sinorhizobium meliloti were also reported to produce polygalacturonase, for which it was postulated to be involved in the root invasion process (Jimenéz-Zurdo et al., 1996). Accordingly, this putative feature in R. tumorigenes could also have a role in the degradation of the pectin network that comprises plant cell walls and colonization of particular plant hosts.
Species R. tumorigenes and R. rhododendri do not carry genes associated with nodulation (e.g., nodA and nodC) or nitrogen fixation (e.g., nifH), suggesting that they do not have N 2 -fixing symbiotic capacities.

| Differentiation of novel Rhizobiaceae genera
Based on genus demarcation thresholds defined in our previous study (Kuzmanović, Fagorzi, et al., 2022), the core Rhizobium superclade should be split into at least three genera. Besides the clade "leguminosarum-etli" (Rhizobium sensu stricto), which includes the type species of the genus Rhizobium (R. leguminosarum), clades "tropici-rhizogenes," and "tumorigenes" represent candidates for new Rhizobiaceae genera. In our opinion, such a division of Rhizobium species would require additional genomic or phenotypic evidence, thus revealing factors relevant to the biological and ecological diversification of these clades. However, this taxonomic revision was not an objective of this work, and we followed the taxonomic scheme preserving the current structure of the genus

Rhizobium.
The taxonomic position of R. tubonense was not completely clear.
This species has relatively high proteome relatedness with both "tumorigenes" and "tropici-rhizogenes" clade representatives. For instance, cpAAI comparisons based on 169 marker proteins yielded values slightly above the threshold for genus demarcation (~86%) in both cases (Kuzmanović, Fagorzi, et al., 2022). In core-proteome and pan-genome phylogenetic trees, R. tubonense was located on a distant branch, although the "tumorigenes" clade was its closest relative. Taken together, R. tubonense might represent an additional candidate for a separate Rhizobium genus. Nonetheless, this requires further study, including additional phylogenetic lineages more closely related to R. tubonense, which are expected to be discovered in the future.

| CONCLUSIONS
This study revealed additional genomic and taxonomic diversity of tumorigenic agrobacteria. OGRIs and phylogenomic analyses clearly showed that tumorigenic strains isolated from rhododendron represent a novel species of the genus Rhizobium for which the name R. rhododendri sp. nov. is proposed. By searching GenBank, additional R. rhododendri strains isolated from blueberry and Himalayan blackberry in the United States were identified. Both species of the "tumorigenes" clade (R. rhododendri and R. tumorigenes) contain multipartite genomes, including a chromosome, putative chromids, and megaplasmids. Interestingly, these two species showed distinct genome architecture. Our investigation indicated that the large putative chromid of R. rhododendri is a cointegrant of an R.
Moreover, evidence of interreplicon DNA exchange between putative chromids of one R. tumorigenes lineage was detected.
Furthermore, we examined the pan-genome of members of the  F I G U R E A1 Eckhardt-type agarose gel electrophoresis of extrachromosomal replicons of Rhizobium rhododendri rho-6.2 T , and Rhizobium tumorigenes 1078 T and 932. Although finished genomes were not obtained for R. rhododendri strains rho-1.1, rho-3.1, rho-4.1, rho-13.1, and rho-14.3 in this study, they were also examined by gel electrophoresis. "Agrobacterium fabrum" C58 T , Allorhizobium ampelinum S4 T , and Rhizobium rhizogenes K84 carrying replicons of known size were used as markers.
F I G U R E A2 Comparison of chromosome (a), chromid 1 (b), and chromid 2 (c) of Rhizobium rhododendri rho-6.2 T (reference sequences) with the orthologous replicons of related R. rhododendri and Rhizobium tumorigenes strains (query sequences). Genetic coordinates of the reference sequences are shown within the thin inner ring. The thick black rings portray corresponding replicons of R rhododendri rho-6.2 T . The colored rings portray query sequences, as indicated in the Figure. The accession numbers of sequences used for comparison are indicated in Table  A3. The Figure was generated using BRIG software and edited with Inkscape (see M& M for details).
F I G U R E A3 Maximum likelihood tree based on the concatenated sequence of RepA and RepB proteins identified in genomes of Rhizobium rhododendri rho-6.2 T , and Rhizobium tumorigenes 1078 T and 932. Corresponding strains are indicated in parentheses or within replicon names. The tree was constructed using model LG + G4. The numbers on the nodes indicate the SH-aLRT support values (first value) and ultra-fast bootstrap values (second value). The tree was midpoint rooted. The scale bar represents the estimated number of amino acid substitutions per site. The trees based on individual RepA and RepB sequences showed identical topology to the tree shown here. F I G U R E A7 Clustered heatmap of average nucleotide identity (ANI) values between the members of the clade "tumorigenes" and other Rhizobiaceae clades. Three Mesorhizobium spp. strains were included as the outgroup. The ANI calculations were performed using ANIb, OrthoANIu, and FastANI methods. The only incongruence between these three methods was observed for strains Rhizobium favelukesii LPU83 T and Rhizobium tibeticum CCBAU 85039 T , for which the ANIb and orthoANIu values were ≥96%, while the fastANI value was at the threshold for species delineation (95-95.99%), as indicated by orange circles on the heatmap. ANI values were clustered using the coreproteome phylogeny of Figure 3 and A5. The "tumorigenes" clade and its species are indicated with blue boxes.
F I G U R E A6 Maximum-likelihood pan-genome phylogeny showing the relationships between and within the clade "tumorigenes" and other Rhizobiaceae clades (uncollapsed). Three Mesorhizobium spp. strains were included as the outgroup to root the tree. The tree was estimated with IQ-TREE from the consensus (COGtriangles and OMCL clusters) gene presence/absence matrix containing 71,538 clusters obtained using GET_HOMOLOGUES software. The numbers on the nodes indicate the approximate Bayesian posterior probabilities support values (first value) and ultra-fast bootstrap values (second value), as implemented in IQ-TREE. The scale bar represents the number of expected substitutions per site under the best-fitting GTR2 + FO + R5 model. The same tree, but without collapsing clades, is presented in Figure 4.
F I G U R E A8 Clustered heatmap of digital DNA-DNA hybridization (dDDH) values between the members of the clade "tumorigenes" and other Rhizobiaceae clades. Three Mesorhizobium spp. strains were included as the outgroup. dDDH values were clustered using the coreproteome phylogeny of Figures 3 and A5. The "tumorigenes" clade and its species are indicated with blue boxes.
F I G U R E A10 Clustered heatmap of core-proteome average amino-acid identity (cpAAI) values between the members of the clade "tumorigenes" and other Rhizobiaceae clades. Three Mesorhizobium spp. strains were included as the outgroup. cpAAI values were computed from 191 marker proteins selected by GET_PHYLOMARKERS software from the dataset used in this study. These 191 marker proteins were employed to infer core-proteome phylogeny shown in Figures 3 and A5. The core-proteome phylogeny was used to cluster cpAAI values. The "tumorigenes" clade and other Rhizobiaceae clades are indicated with red boxes.
F I G U R E A11 Cumulative GC skew, showing using a 10-kb sliding window, for putative chromid 1 of R. rhododendri rho-6.2 T . The X-axis represents the nucleotide position, with the Y-axis showing the cumulative GC skew. Values were calculated using an in-house script available at https://github.com/diCenzo-Lab/007_2023_Rhizobium_ rhododendri.